A RECURSIVE FEATURE ELIMINATING METHOD BASED ON A SUPPORT VECTOR MACHINE

BACKGROUND 

[0001] A recursive feature eliminating method based on a support vector machine
(SVM-RFE) is widely used in data-intensive applications, such as disease gene
selection, structured data mining, and unstructured data mining. The SVM-RFE
method may comprise: SVM training on input training data to classify the
training data, wherein the training data may comprise a plurality of training
samples corresponding to a group of features and class labels associated with
each of the training samples; eliminating at least one feature with a minimum
ranking criterion from the group of features; and repeating the aforementioned
SVM training and eliminating until the group becomes empty. The SVM-RFE may
be used to rank the features, for example, to rank the genes that may cause a
disease. The rounds of SVM training and eliminating are independent of each other.

BRIEF DESCRIPTION OF THE DRAWINGS 

[0002] The invention described herein is illustrated by way of example and not by
way of limitation in the accompanying figures. For simplicity and clarity of
illustration, elements illustrated in the figures are not necessarily drawn to scale.
For example, the dimensions of some elements may be exaggerated relative to
other elements for clarity. Further, where considered appropriate, reference labels
have been repeated among the figures to indicate corresponding or analogous
elements.
[0003] Fig. 1 illustrates an embodiment of a computing system applying a SVM-RFE method.

[0004] Fig. 2 illustrates an embodiment of a SVM-RFE machine in the computing system of Fig. 1.

[0005] Fig. 3 illustrates an embodiment of a SVM-RFE method.

[0006] Fig. 4 illustrates an embodiment of a SVM training method involved in the
SVM-RFE method of Fig. 3.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS 

[0007] The following description describes techniques for a recursive feature
eliminating method based on a support vector machine. In the following
description, numerous specific details such as logic implementations, pseudo-code,
means to specify operands, resource partitioning/sharing/duplication
implementations, types and interrelationships of system components, and logic
partitioning/integration choices are set forth in order to provide a more thorough
understanding of the current invention. However, the invention may be practiced
without such specific details. In other instances, control structures, gate level
circuits and full software instruction sequences have not been shown in detail in
order not to obscure the invention. Those of ordinary skill in the art, with the
included descriptions, will be able to implement appropriate functionality without
undue experimentation.
[0008] References in the specification to "one embodiment", "an embodiment", "an
example embodiment", etc., indicate that the embodiment described may include
a particular feature, structure, or characteristic, but every embodiment may not
necessarily include the particular feature, structure, or characteristic. Moreover,
such phrases are not necessarily referring to the same embodiment. Further,
when a particular feature, structure, or characteristic is described in connection
with an embodiment, it is submitted that it is within the knowledge of one skilled in
the art to effect such feature, structure, or characteristic in connection with other
embodiments whether or not explicitly described.
[0009] Embodiments of the invention may be implemented in hardware, firmware,
software, or any combination thereof. Embodiments of the invention may also be
implemented as instructions stored on a machine-readable medium that may be
read and executed by one or more processors. A machine-readable medium may
include any mechanism for storing or transmitting information in a form readable
by a machine (e.g., a computing device). For example, a machine-readable
medium may include read only memory (ROM); random access memory (RAM);
magnetic disk storage media; optical storage media; flash memory devices;
electrical, optical, acoustical or other forms of propagated signals (e.g., carrier
waves, infrared signals, digital signals, etc.) and others.
[0010] Fig. 1 shows a computing system for implementing a recursive feature
eliminating method based on a support vector machine (SVM-RFE). A
non-exhaustive list of examples for the computing system may include distributed
computing systems, supercomputers, computing clusters, mainframe computers,
mini-computers, client-server systems, personal computers, workstations, servers,
portable computers, laptop computers and other devices for transceiving and
processing data.

[0011] In an embodiment, the computing system 1 may comprise one or more
processors 10, memory 11, chipset 12, I/O device 13, BIOS firmware 14 and the
like. The one or more processors 10 are communicatively coupled to various
components (e.g., the memory 11) via one or more buses such as a processor
bus as depicted in Fig. 1. The processors 10 may be implemented as an
integrated circuit (IC) with one or more processing cores that may execute codes
under a suitable architecture, for example, including Intel® Xeon™ MP
architecture available from Intel Corporation of Santa Clara, California.
[0012] In an embodiment, the memory 11 may store codes to be executed by the
processor 10. In an embodiment, the memory 11 may store training data 110,
SVM-RFE 111 and operating system (OS) 112. A non-exhaustive list of examples
for the memory 11 may comprise one or a combination of the following
semiconductor devices, such as synchronous dynamic random access memory
(SDRAM) devices, RAMBUS dynamic random access memory (RDRAM) devices,
double data rate (DDR) memory devices, static random access memory (SRAM),
flash memory devices, and the like.
[0013] In an embodiment, the chipset 12 may provide one or more communicative
paths among the processors 10, memory 11 and various components, such as the
I/O device 13 and BIOS firmware 14. The chipset 12 may comprise a memory
controller hub 120, an input/output controller hub 121 and a firmware hub 122.
[0014] In an embodiment, the memory controller hub 120 may provide a
communication link to the processor bus that may connect with the processors 10
and to a suitable device such as the memory 11. The memory controller hub 120
may couple with the I/O controller hub 121, which may provide an interface to the
I/O devices 13 or peripheral components (not shown in Fig. 1) for the computing
system 1 such as a keyboard and a mouse. A non-exhaustive list of examples for
the I/O devices 13 may comprise a network card, a storage device, a camera, a
Bluetooth device, an antenna, and the like. The I/O controller hub 121 may further
provide a communication link to a graphic controller and an audio controller (not
shown in Fig. 1). The graphic controller may control the display of information on a
display device and the audio controller may control the output of information on
an audio device.

[0015] In an embodiment, the memory controller hub 120 may communicatively
couple with a firmware hub 122 via the input/output controller hub 121. The
firmware hub 122 may couple with the BIOS firmware 14 that may store routines
that the computing device 1 executes during system startup in order to initialize
the processors 10, chipset 12, and other components of the computing device 1.
Moreover, the BIOS firmware 14 may comprise routines or drivers that the
computing device 1 may execute to communicate with one or more components
of the computing device 1.

[0016] In an embodiment, the training data 110 may be input from a suitable
device, such as the I/O device 13 or the BIOS firmware 14. Examples for the
training data 110 may comprise data collected for a feature selection/ranking task,
such as gene expression data from a plurality of human beings or other species,
or text data from the web or other sources. The data format may be structured data,
such as a database or table, or unstructured data, such as a matrix or vector. The
SVM-RFE 111 may be implemented between the training data 110 and the
operating system 112. In an embodiment, the operating system 112 may include,
but is not limited to, different versions of LINUX, Microsoft Windows™ Server 2003,
and real time operating systems such as VxWorks™, etc. In an embodiment, the
SVM-RFE 111 may implement operations of: SVM training the training data 110
that corresponds to a group of features; eliminating at least one feature from the
group according to a predetermined ranking criterion; and repeating the SVM

training and feature eliminating until the number of features in the group reaches a
predetermined value, for example, until the group becomes empty, wherein the
rounds of SVM training and eliminating are dependent on each other. The SVM-RFE
111 may output a feature elimination history or a feature ranking list.

[0017] Other embodiments may implement other modifications or variations to the
structure of the aforementioned computing system 1. For example, the SVM-RFE
111 may be implemented as an integrated circuit with various functional logics as
depicted in Fig. 2. For another example, the memory 11 may further comprise
validation software (not shown in Fig. 1) to validate the classification by
the SVM-RFE 111. More specifically, the validation software may determine
whether a person has a disease by checking his/her gene expression against a gene
ranking list output by the SVM-RFE 111.

[0018] An embodiment of the SVM-RFE 111 is shown in Fig. 2. As shown, the
SVM-RFE 111 may comprise a decision logic 21, a SVM learning machine 22, a
ranking criterion logic 23 and an eliminating logic 24.

[0019] In an embodiment, the training data 110 input to the SVM-RFE 111 may
comprise a plurality of training samples $[x_1, x_2, \ldots, x_m]$ corresponding to a group of
features, wherein m represents the number of training samples. The training data
may further comprise class labels associated with each of the training samples
$[y_1, y_2, \ldots, y_m]$. In an embodiment, each of the training samples represents a vector of n
dimensions, wherein each dimension corresponds to one feature, and each of
the class labels has a number of values. For example, if the training data is gene
data collected from a plurality of persons, each of the training samples represents
a pattern of n gene expression coefficients for one person, and each of the class
labels has two values (i.e., [1, -1]) to represent a two-class classification of its
associated training sample, e.g., whether the person has a certain disease ($y_i = 1$)
or not ($y_i = -1$).

[0020] In an embodiment, the decision logic 21 may determine whether the group
is empty and output a feature ranking list or feature elimination history if so.
However, if the group is not empty, the SVM learning machine 22 may train the
training data by setting a normal to a hyperplane where the training data may be
mapped to leave the largest possible margin on either side of the normal. The
SVM learning machine 22 may comprise a linear SVM learning machine and a
non-linear SVM learning machine. In an embodiment for a linear SVM learning machine,
a normal may comprise a vector ($\vec{w}$) representing a linear combination of the
training data. For a non-linear SVM learning machine, a normal may comprise a
vector representing a non-linear combination of the training data. Each
component of the vector represents a weight for each feature in the group of
features.

[0021] In an embodiment, the ranking criterion logic 23 may compute a
predetermined ranking criterion for each feature based upon the weight vector $\vec{w}$.
The eliminating logic 24 may eliminate at least one feature with a certain ranking
criterion from the group of features, for example, the at least one feature with a
minimum or maximum ranking criterion in the group of features. Then, the
decision logic 21 may determine whether the group becomes empty. If not, then in
another round of SVM training and feature eliminating, the SVM learning machine
22 will retrain the training data corresponding to the group of features without the
eliminated ones, and the ranking criterion logic 23 and eliminating logic 24 may
compute the predetermined ranking criterion for each feature in the group and
eliminate at least one feature with a minimum ranking criterion from the group of
features. The SVM-RFE 111 may repeat the rounds of SVM training and feature
eliminating as described above until the group becomes empty.
[0022] In an embodiment, the SVM learning machine 22 may comprise a kernel
data logic 220, a buffer 221, a Lagrange multiplier logic 222 and a weight logic
223. In a first round of SVM training, the kernel data logic 220 may compute the
kernel data based on the training data corresponding to the group of features and
store the kernel data in the buffer 221. Then, in each later round of SVM training,
the kernel data logic 220 may retrieve the kernel data from the buffer 221, update the
kernel data based on a part of the training data corresponding to the at least one
feature that was eliminated in a previous round, and store the updated kernel
data in the buffer in place of the old one.
[0023] In an embodiment, the Lagrange multiplier logic 222 may compute a
Lagrange multiplier $\alpha_i$ for each of the training samples by utilizing the kernel data
output from the kernel data logic 220, and the weight logic 223 may obtain a
weight $w_k$ for each feature in the group of features, wherein i is an integer in a
range of [1, the number of training samples], and k is an integer in a range of [1,
the number of features].
[0024] Fig. 3 depicts an embodiment of a SVM-RFE method that may be
implemented by the SVM-RFE 111.
[0025] As depicted, the SVM-RFE 111 may input the training data 110 in block
301. In an embodiment, the training data may comprise a plurality of training
samples $[x_1, x_2, \ldots, x_m]$, wherein m represents the number of training samples. The
training data may further comprise class labels associated with each of the
training samples $[y_1, y_2, \ldots, y_m]$. Each of the training samples may represent a
vector of n dimensions, wherein each dimension corresponds to one feature in a
group of features (hereinafter, the group is labeled as group G), and each of the class
labels has a number of values to represent the class that its associated training
sample belongs to.
[0026] In block 302, the decision logic 21 of SVM-RFE 111 may determine
whether the number of features in the group G is zero. If the number
of features in the group G is greater than zero, then the SVM learning machine 22
of SVM-RFE 111 may train the training data corresponding to the features in the
group G, so as to obtain a weight vector ($\vec{w}$) for the training data (block 303). Each
component of the weight vector represents a weight (e.g., weight $w_k$) for a
feature (e.g., the k-th feature) in the group G.
[0027] Then, the ranking criterion logic 23 may compute a ranking criterion for
each feature in the group G based on its weight in block 304. In an embodiment,
the ranking criterion is a square of the weight, e.g., $c_k = (w_k)^2$, wherein $c_k$
represents the ranking criterion for the k-th feature. However, in other embodiments,
the ranking criterion may be obtained in other ways.

[0028] In block 305, the eliminating logic 24 may eliminate at least one feature
with a certain ranking criterion from the group G. In an embodiment, the at least
one feature (e.g., the k-th feature) may correspond to the ranking criterion (e.g.,
$c_k = (w_k)^2$) that is the minimum in the group G. In another embodiment, the at least one
feature may correspond to the ranking criterion that is the maximum in the group
G. In other embodiments, the at least one feature may be eliminated in other ways.
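
For illustration only, blocks 304 and 305 may be sketched in Python for the embodiment in which the ranking criterion is the squared weight and the feature with the minimum criterion is eliminated. The function names `ranking_criteria` and `feature_to_eliminate` are hypothetical and are not part of the described logics.

```python
def ranking_criteria(weights):
    """Return c_k = (w_k)^2 for every feature weight w_k (block 304)."""
    return [w * w for w in weights]


def feature_to_eliminate(criteria):
    """Return the position of the smallest criterion (block 305, minimum variant).

    Ties are resolved in favour of the earliest position.
    """
    return min(range(len(criteria)), key=criteria.__getitem__)


# Example: three features; the sign of a weight does not matter, only its size.
criteria = ranking_criteria([0.5, -2.0, 0.1])
position = feature_to_eliminate(criteria)
```

An embodiment that eliminates the feature with the maximum criterion would use `max` in place of `min`.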
[0029] In block 306, the eliminating logic 24 of the SVM-RFE 111 or other suitable
logics may optionally update the training data by removing a part of the training
data that corresponds to the eliminated features. In an embodiment in which the input
training data comprises m training samples and m class labels associated with
the training samples, and each of the training samples is a vector of n dimensions
wherein each dimension corresponds to one feature of the group G, the updated
training data may comprise m training samples and m class labels associated with
the training samples, and each of the training samples is a vector of (n-p)
dimensions, wherein (n-p) represents the number of the features in the group G
after p features have been eliminated in block 305.
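
As a minimal sketch of block 306, assuming the training samples are held as a list of m rows with n values each, the update amounts to dropping the columns of the eliminated features. The helper name `drop_features` is hypothetical.

```python
def drop_features(samples, eliminated):
    """Return new samples without the columns listed in `eliminated`.

    Each of the m rows shrinks from n to (n - p) values, where p is the
    number of distinct eliminated column positions. Class labels are untouched.
    """
    drop = set(eliminated)
    return [[value for k, value in enumerate(row) if k not in drop]
            for row in samples]


# Two samples with three features; feature 1 was eliminated in block 305.
updated = drop_features([[1, 2, 3], [4, 5, 6]], [1])
```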
[0030] In block 307, the eliminating logic 24 of the SVM-RFE 111 or other suitable
logics may record the eliminating history, or record the feature ranking list based
on the eliminating history. In an embodiment, the at least one feature eliminated
in block 305 may be listed as a least important feature in the feature ranking list.
In another embodiment, the at least one feature may be listed as a most important
feature in the feature ranking list.
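
For the embodiment in which an eliminated feature is treated as the least important of those remaining, a feature ranking list may be derived from the eliminating history by reading the history backwards. The sketch below assumes the history is a list with one entry per round, each entry listing the features eliminated in that round; the name `ranking_from_history` is hypothetical.

```python
def ranking_from_history(history):
    """Return features ordered from most to least important.

    `history[r]` lists the features eliminated in round r. Features that
    survived longest were eliminated last, so they are ranked first.
    """
    ranked = []
    for eliminated in reversed(history):
        ranked.extend(eliminated)
    return ranked


# Feature 3 was eliminated first, feature 1 last.
ranking = ranking_from_history([[3], [0], [2], [1]])
```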
[0031] Then, the decision logic 21 of the SVM-RFE 111 may continue to
determine whether the number of features in the group G is zero in block 302. If
not, the round of SVM training and feature eliminating as described with reference
to blocks 303-307 may be repeated until the group G is determined to be empty,
namely, the number of features therein is zero.
[0032] If the decision logic 21 determines the number of features in the group G is
zero in block 302, then the decision logic 21 or other suitable logics of SVM-RFE
111 may output the eliminating history or the feature ranking list.
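
The loop of blocks 302-307 may be sketched as follows. This is an illustration under stated assumptions, not the described implementation: one feature is eliminated per round, the minimum squared weight is used as the criterion, and the SVM training step is supplied by the caller as `train(samples, labels)` returning one weight per remaining feature. The names `svm_rfe` and `toy_train` are hypothetical, and `toy_train` is only a stand-in that is not an SVM.

```python
def svm_rfe(samples, labels, train):
    """Return the eliminating history as a list of original feature indices."""
    remaining = list(range(len(samples[0])))        # group G
    history = []
    while remaining:                                # block 302
        view = [[row[k] for k in remaining] for row in samples]
        weights = train(view, labels)               # block 303
        criteria = [w * w for w in weights]         # block 304
        worst = min(range(len(criteria)), key=criteria.__getitem__)  # block 305
        history.append(remaining.pop(worst))        # blocks 306-307
    return history


def toy_train(view, labels):
    """Stand-in trainer (NOT an SVM): label-weighted column sums."""
    return [sum(y * row[k] for row, y in zip(view, labels))
            for k in range(len(view[0]))]


history = svm_rfe([[3.0, 0.1, 1.0], [-3.0, -0.2, -1.0]], [1, -1], toy_train)
```

Passing the trainer as a parameter keeps the loop independent of how the weights are obtained, which is where the kernel data reuse of Fig. 4 would be applied.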
[0033] Fig. 4 depicts an embodiment of SVM training implemented by the SVM
learning machine 22 in block 303 of Fig. 3. In the embodiment, blocks depicted in
Fig. 4 may be implemented in each round of SVM training and feature elimination.

[0034] As depicted, the kernel data logic 220 of the SVM learning machine 22 or
other suitable logics may determine whether it is the first round of SVM training for
the training data 110 (block 401). This determination may be accomplished by
setting a count number. If it is the first round of SVM training, then the kernel data
logic 220 may compute a kernel data based on the training data 110 in block 402.
In an embodiment for linear SVM training, the kernel data may be computed by
the following equations (1) and (2):

$$K^{\mathrm{round1}} = \begin{bmatrix} k_{1,1}^{\mathrm{round1}} & \cdots & k_{1,m}^{\mathrm{round1}} \\ \vdots & \ddots & \vdots \\ k_{m,1}^{\mathrm{round1}} & \cdots & k_{m,m}^{\mathrm{round1}} \end{bmatrix} \qquad (1)$$

$$k_{i,j}^{\mathrm{round1}} = x_i^T x_j = \sum_{k=1}^{n} x_{ik} x_{jk} \qquad (2)$$

wherein $K^{\mathrm{round1}}$ is the kernel data, a matrix with $m \times m$ components
$k_{i,j}^{\mathrm{round1}}$, m represents the number of training samples, $x_i^T$ represents a
transpose of the i-th training sample that is a vector of n components, $x_j$ represents
the j-th training sample that is another vector of n components, and n represents the
number of features in the group G. Other embodiments may implement other modifications
and variations to block 402. For example, for non-linear SVM training, the kernel
data may be obtained in a different way, e.g., the Gaussian RBF kernel:

$$k_{i,j}^{\mathrm{round1}} = e^{-\|x_i - x_j\|^2 / 2\sigma^2} \qquad (3)$$
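
The first-round kernel data of block 402 may be sketched in plain Python as below. The linear case takes the dot product of every pair of training samples; the Gaussian RBF case is written under the assumption that the kernel has the usual form exp(-||x_i - x_j||^2 / (2 sigma^2)), with `sigma` a caller-chosen width. The function names are hypothetical.

```python
import math


def linear_kernel(samples):
    """Return the m x m matrix of dot products x_i . x_j."""
    m = len(samples)
    return [[sum(a * b for a, b in zip(samples[i], samples[j]))
             for j in range(m)] for i in range(m)]


def rbf_kernel(samples, sigma):
    """Return the m x m Gaussian RBF kernel (assumed form, see lead-in)."""
    m = len(samples)

    def squared_distance(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))

    return [[math.exp(-squared_distance(samples[i], samples[j])
                      / (2.0 * sigma ** 2))
             for j in range(m)] for i in range(m)]


K_linear = linear_kernel([[1, 2], [3, 4]])
K_rbf = rbf_kernel([[0.0, 0.0], [1.0, 1.0]], sigma=1.0)
```

Both matrices are symmetric, so an implementation concerned with cost could compute only the upper triangle.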
[0035] Then, the kernel data logic 220 stores the kernel data in the buffer 221 of
the SVM learning machine 22 in block 403. The Lagrange multiplier logic 222 may
compute a Lagrange multiplier matrix based upon the kernel data in blocks 408-412
and the weight logic 223 may compute a weight vector based on the
Lagrange multiplier matrix in block 414. With these implementations, the first
round of SVM training for the training data 110 is completed.
[0036] However, if the kernel data logic 220 or other suitable logics determines
that it is not the first round of SVM training for the training data 110 in block 401,
then in block 404, the kernel data logic 220 or other suitable logics may input the
at least one feature eliminated in a previous round of feature elimination
implemented in block 305 of Fig. 3. For example, if it is the q-th round of SVM training
(q>1), then the kernel data logic or other suitable logics may input the at least one
feature eliminated in the (q-1)-th round of feature elimination (e.g., the p-th feature that
is eliminated from the group of n features in the (q-1)-th round of feature
elimination). Then, the kernel data logic 220 may retrieve the kernel data stored in
the buffer 221 in a previous round of SVM training (block 405), and update the
kernel data based on a part of the training data corresponding to the at least one
eliminated feature (block 406). In an embodiment for linear SVM training, the
kernel data may be updated by the following equations (4) and (5):

$$K^{\mathrm{round}(q)} = \begin{bmatrix} k_{1,1}^{\mathrm{round}(q)} & \cdots & k_{1,m}^{\mathrm{round}(q)} \\ \vdots & \ddots & \vdots \\ k_{m,1}^{\mathrm{round}(q)} & \cdots & k_{m,m}^{\mathrm{round}(q)} \end{bmatrix} \qquad (4)$$

$$k_{i,j}^{\mathrm{round}(q)} = k_{i,j}^{\mathrm{round}(q-1)} - x_{ip} x_{jp} \qquad (5)$$

wherein $k_{i,j}^{\mathrm{round}(q)}$ represents a component of the kernel data K in the q-th round of SVM
training, $k_{i,j}^{\mathrm{round}(q-1)}$ represents a component of the kernel data K in the (q-1)-th round of
SVM training, $x_{ip}$ represents the value, in the i-th training sample, of the p-th feature that is
eliminated in the (q-1)-th round of feature elimination, and $x_{jp}$ represents the value, in the j-th training
sample, of the p-th feature that is eliminated in the (q-1)-th round of feature elimination.
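
The linear update of block 406 may be checked against a full recomputation with the short sketch below. It assumes the kernel is held as a list of lists and that `samples` still contains the column of the eliminated feature; the names `linear_kernel` and `remove_feature_from_kernel` are hypothetical.

```python
def linear_kernel(samples):
    """Full computation: dot products over all n features."""
    m = len(samples)
    return [[sum(a * b for a, b in zip(samples[i], samples[j]))
             for j in range(m)] for i in range(m)]


def remove_feature_from_kernel(kernel, samples, p):
    """Apply k_ij <- k_ij - x_ip * x_jp in place for eliminated feature p."""
    m = len(samples)
    for i in range(m):
        for j in range(m):
            kernel[i][j] -= samples[i][p] * samples[j][p]
    return kernel


samples = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
updated = remove_feature_from_kernel(linear_kernel(samples), samples, p=1)
recomputed = linear_kernel([[row[0], row[2]] for row in samples])
```

The update touches each of the m x m components once per eliminated feature, whereas recomputing the kernel sums over every remaining feature for each component, which is the saving the buffered kernel data is meant to provide.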
[0037] Other embodiments may implement other modifications and variations to
block 406. For example, for non-linear SVM training, the kernel data may be
updated in a different way, e.g., for the Gaussian RBF kernel, a component for the
kernel data K in the q-th round may be updated by

$$k_{i,j}^{\mathrm{round}(q)} = k_{i,j}^{\mathrm{round}(q-1)} \times e^{(x_{ip} - x_{jp})^2 / 2\sigma^2} \qquad (6)$$
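
For the Gaussian RBF case, a sketch of the update can be derived under the assumption that the kernel has the form exp(-||x_i - x_j||^2 / (2 sigma^2)). Eliminating feature p reduces the squared distance by (x_ip - x_jp)^2, so each buffered component is multiplied by exp((x_ip - x_jp)^2 / (2 sigma^2)). The function names below are hypothetical.

```python
import math


def rbf_kernel(samples, sigma):
    """Full computation of the assumed Gaussian RBF kernel."""
    m = len(samples)
    return [[math.exp(-sum((a - b) ** 2
                           for a, b in zip(samples[i], samples[j]))
                      / (2.0 * sigma ** 2))
             for j in range(m)] for i in range(m)]


def remove_feature_from_rbf_kernel(kernel, samples, p, sigma):
    """Multiply each component by exp((x_ip - x_jp)^2 / (2 sigma^2)) in place."""
    m = len(samples)
    for i in range(m):
        for j in range(m):
            gap = samples[i][p] - samples[j][p]
            kernel[i][j] *= math.exp(gap * gap / (2.0 * sigma ** 2))
    return kernel


samples = [[1.0, 2.0, 3.0], [4.0, 6.0, 5.0]]
updated = remove_feature_from_rbf_kernel(rbf_kernel(samples, 2.0), samples, 1, 2.0)
recomputed = rbf_kernel([[row[0], row[2]] for row in samples], 2.0)
```

Because the update is multiplicative, rounding error may accumulate over many rounds; an implementation might recompute the kernel occasionally if that matters.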

[0038] Then, in block 407, the kernel data logic 220 may replace the kernel data in
the buffer 221 with the updated kernel data obtained in block 406. The Lagrange
multiplier logic 222 may compute a Lagrange multiplier matrix based on the kernel
data in blocks 408-412 and the weight logic 223 may compute a weight vector
based on the Lagrange multiplier matrix in block 414. With these implementations,
the q-th round of SVM training is completed.

[0039] More specifically, in block 408, the Lagrange multiplier logic 222 may
initialize a Lagrange multiplier matrix $\alpha$ in each round of SVM training, wherein
each component of the $\alpha$ matrix represents a Lagrange multiplier (e.g., $\alpha_i$)
corresponding to a training sample $x_i$. In an embodiment, the initialization of the
Lagrange multiplier matrix may be implemented by setting a predetermined value
(e.g., zero) to each component of the Lagrange multiplier matrix.



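[0040] Then, in block 409, the Lagrange multiplier logic 222 may determine
whether each of the Lagrange multipliers corresponding to each of the training
samples (e.g., $[\alpha_1, \alpha_2, \ldots, \alpha_m]$) fulfills the Karush-Kuhn-Tucker (KKT) conditions.
More specifically, whether each of the Lagrange multipliers fulfills the following
five conditions:

1. $\frac{\partial}{\partial w_v} L(w, b, \alpha) = w_v - \sum_{i=1}^{m} \alpha_i y_i x_{iv} = 0, \quad v = 1, \ldots, n$

2. $\frac{\partial}{\partial b} L(w, b, \alpha) = -\sum_{i=1}^{m} \alpha_i y_i = 0$

3. $y_i (x_i \cdot w + b) - 1 \ge 0, \quad i = 1, \ldots, m$

4. $\alpha_i \ge 0 \quad \forall i$

5. $\alpha_i \left( y_i (x_i \cdot w + b) - 1 \right) = 0 \quad \forall i$

wherein $w_v$ represents the weight for the v-th feature, $b$ represents a bias value, and
$L(w, b, \alpha)$ represents a Lagrangian with $w$, $b$ and $\alpha$ as variables:

$$L(w, b, \alpha) = \frac{1}{2} (w \cdot w) - \sum_{i=1}^{m} \alpha_i \left[ y_i \left( (w \cdot x_i) + b \right) - 1 \right] \qquad (7)$$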
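
The check of block 409 may be sketched as a tolerance-based test of conditions 2 to 5, on the assumption that condition 1 is satisfied by construction because the weights are computed from the multipliers. The function takes the decision value of each training sample (the expression multiplied by $y_i$ in condition 3) as a precomputed input; the name `satisfies_kkt` and the tolerance `tol` are hypothetical.

```python
def satisfies_kkt(alphas, labels, outputs, tol=1e-6):
    """Return True if conditions 2-5 hold within `tol`.

    alphas[i]  : Lagrange multiplier of sample i
    labels[i]  : class label, +1 or -1
    outputs[i] : decision value of sample i (weights and bias already applied)
    """
    for alpha, y, u in zip(alphas, labels, outputs):
        margin = y * u - 1.0
        if alpha < -tol:                  # condition 4
            return False
        if margin < -tol:                 # condition 3
            return False
        if abs(alpha * margin) > tol:     # condition 5
            return False
    balance = sum(alpha * y for alpha, y in zip(alphas, labels))
    return abs(balance) <= tol            # condition 2


# Two samples at x = +1 and x = -1 with w = 1 and zero bias.
ok = satisfies_kkt([0.5, 0.5], [1, -1], [1.0, -1.0])
```

A finite tolerance is used because iterative solvers rarely reach the conditions exactly in floating point.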



[0041] If not all of the Lagrange multipliers fulfill the KKT conditions, the Lagrange
multiplier logic 222 may initialize an active set for two Lagrange multipliers in block
410. In an embodiment, the initialization of the active set may be implemented by
clearing a data fragment in a memory of the computing system to store the active
set. In other embodiments, the active set may be initialized in other ways.

[0042] Then, in block 411, the Lagrange multiplier logic 222 may select two
Lagrange multipliers (e.g., $\alpha_1$ and $\alpha_2$) as an active set with heuristics, wherein the
two Lagrange multipliers violate the KKT conditions with minimum errors (e.g.,
errors $E_1$ and $E_2$ respectively associated with the two Lagrange multipliers $\alpha_1$
and $\alpha_2$) under a predetermined constraint. In order to do that, the Lagrange
multiplier logic 222 may obtain the errors associated with each of the Lagrange
multipliers (e.g., $[\alpha_1, \alpha_2, \ldots, \alpha_m]$) by utilizing the kernel data stored in the buffer
221. In an embodiment for linear SVM training, the predetermined constraint may
comprise $0 \le \alpha_i \le C$, wherein C is a predetermined value, and the error associated
with each Lagrange multiplier may be obtained by the following equation and then
stored in an error cache:

$$E_j = \left( \sum_{i=1}^{m} \alpha_i y_i k_{i,j}^{\mathrm{round}(q)} \right) - y_j \qquad (8)$$

wherein $E_j$ represents an error associated with a Lagrange multiplier $\alpha_j$ in the q-th
round of SVM training, and $k_{i,j}^{\mathrm{round}(q)}$ may be obtained from the kernel data stored in the
buffer 221. Other embodiments may implement other modifications and variations
to block 411. For example, the active set may comprise a number of Lagrange
multipliers other than two.
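
The error cache may be sketched as below. The sketch assumes the error of sample j is the kernel-based output, i.e. the sum over i of alpha_i * y_i * k_ij, minus the label y_j, with any bias term left out for simplicity; a constant bias would shift every error equally and so would not change the difference between two errors. The name `error_cache` is hypothetical.

```python
def error_cache(alphas, labels, kernel):
    """Return [E_1, ..., E_m] computed from the buffered kernel data."""
    m = len(labels)
    return [sum(alphas[i] * labels[i] * kernel[i][j] for i in range(m))
            - labels[j]
            for j in range(m)]


# Linear kernel of the one-dimensional samples x = +1 and x = -1.
kernel = [[1.0, -1.0], [-1.0, 1.0]]
initial_errors = error_cache([0.0, 0.0], [1, -1], kernel)
final_errors = error_cache([0.5, 0.5], [1, -1], kernel)
```

With all multipliers at zero the errors are simply the negated labels, which is the state after the initialization of block 408.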
[0043] Then, in block 412, the Lagrange multiplier logic 222 may update the
Lagrange multipliers in the active set by utilizing the kernel data K stored in the
buffer 221. In an embodiment in which the SVM learning machine is a linear learning
machine and the active set comprises two Lagrange multipliers (e.g., $\alpha_1$ and
$\alpha_2$), the Lagrange multipliers may be updated with the following equations:

$$\alpha_2^{new} = \alpha_2 - \frac{y_2 (E_1 - E_2)}{\eta}, \qquad \eta = 2 k_{1,2} - k_{1,1} - k_{2,2} \qquad (9)$$

$$\alpha_2^{new,clipped} = \begin{cases} H & \text{if } \alpha_2^{new} \ge H \\ \alpha_2^{new} & \text{if } L < \alpha_2^{new} < H \\ L & \text{if } \alpha_2^{new} \le L \end{cases} \qquad (10)$$

$$L = \max(0, \alpha_2 - \alpha_1), \qquad H = \min(C, C + \alpha_2 - \alpha_1)$$

[0044] However, other embodiments may implement other modifications and 

variations to block 412. 
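
One possible form of the two-multiplier update is sketched below in the style of sequential minimal optimization. Several parts are assumptions rather than details taken from the description: the bounds used when the two labels are equal, the update of the first multiplier so that the sum of alpha_i * y_i is preserved, and the choice to leave the pair unchanged when no progress is possible. The name `update_pair` is hypothetical.

```python
def update_pair(a1, a2, y1, y2, e1, e2, k11, k12, k22, c):
    """Return (a1_new, a2_new_clipped) for one active set of two multipliers."""
    if y1 != y2:
        low, high = max(0.0, a2 - a1), min(c, c + a2 - a1)
    else:
        # Assumed bounds for equal labels (standard SMO form).
        low, high = max(0.0, a1 + a2 - c), min(c, a1 + a2)
    eta = 2.0 * k12 - k11 - k22            # negative for a usable pair
    if eta >= 0.0 or low >= high:
        return a1, a2                      # no progress possible on this pair
    a2_new = a2 - y2 * (e1 - e2) / eta     # unconstrained step
    a2_clipped = min(high, max(low, a2_new))
    # Assumed companion update: keeps sum(alpha_i * y_i) unchanged.
    a1_new = a1 + y1 * y2 * (a2 - a2_clipped)
    return a1_new, a2_clipped


# Samples x = +1 (label 1) and x = -1 (label -1), multipliers starting at zero.
pair = update_pair(0.0, 0.0, 1, -1, -1.0, 1.0, 1.0, -1.0, 1.0, c=10.0)
```

Only three kernel components are read per update, all of which may be taken from the buffered kernel data.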
[0045] Then, in block 413, the Lagrange multiplier logic 222 may update the error
cache by computing the errors associated with the updated Lagrange multipliers
in the active set with the equation (8).

[0046] Then, the Lagrange multiplier logic 222 may continue to update other
Lagrange multipliers in the Lagrange multiplier matrix in blocks 408-413, until all of
the Lagrange multipliers in the matrix fulfill the KKT conditions.

[0047] Then, the weight logic 223 may compute the weight vector ($\vec{w}$) based on
the Lagrange multipliers obtained in blocks 408-413, wherein each component of
the vector corresponds to one of the features. In an embodiment for linear SVM
training, the weight for each feature may be obtained with the following equation:

$$w_k = \sum_{i=1}^{m} \alpha_i y_i x_{ik} \qquad (13)$$

wherein $w_k$ represents a weight for the k-th feature, m represents the number of the
training samples, and $x_{ik}$ represents the training samples corresponding to the
k-th feature. However, other embodiments may implement other modifications and
variations to block 414.
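
For the linear case, block 414 may be sketched as a weighted sum of the training samples, which is the same relation as the first KKT condition (each weight equals the sum over samples of alpha_i * y_i * x_ik). The name `weight_vector` is hypothetical.

```python
def weight_vector(alphas, labels, samples):
    """Return [w_1, ..., w_n] with w_k = sum_i alpha_i * y_i * x_ik."""
    n = len(samples[0])
    return [sum(alpha * y * row[k]
                for alpha, y, row in zip(alphas, labels, samples))
            for k in range(n)]


# Two samples, two features.
weights = weight_vector([0.5, 0.5], [1, -1], [[1.0, 2.0], [-1.0, 0.0]])
```

Samples whose multiplier is zero contribute nothing, so only the support vectors need to be visited in practice.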

Although the present invention has been described in conjunction with
certain embodiments, it shall be understood that modifications and variations may
be resorted to without departing from the spirit and scope of the invention as those
skilled in the art readily understand. Such modifications and variations are
considered to be within the scope of the invention and the appended claims.
What is claimed is:

1. A method, comprising:

determining a value for each feature in a group of features provided by a
training data;

eliminating at least one feature from the group by utilizing the value for
each feature in the group; and

updating the value for each feature in the group based on a part of the
training data that corresponds to the eliminated feature.

2. The method of claim 1, wherein the training data further comprises a
plurality of training samples, each of the training samples corresponding to the
group of features.

3. The method of claim 1, wherein determining the value comprises:

computing a kernel data based on the training data;

computing the value for each feature of the group based on the kernel data;
and

storing the kernel data in a buffer.

4. The method of claim 3, wherein computing the kernel data further 
comprises computing a matrix as the kernel data, each component of the matrix 
comprising a dot product of two of training samples provided by the training data. 

5. The method of claim 1, wherein updating the value further comprises:




f Q/CN 2{)05 / 0 0 1 2 4 


retrieving a kernel data from a buffer; 

updating the kernel data based on the part of the training data that 
corresponds to the eliminated features; and 

updating the value for each feature of the group based on the updated 
kernel data. 

6. The method of claim 5, wherein updating the kernel data further 
comprises: 

subtracting a matrix from the kernel data, each component of the matrix 
comprising a dot product of two of training samples provided by the part of the 
training data. 

7. The method of claim 1 , wherein eliminating at least one feature 
comprises: 

computing a ranking criterion for each feature of the group based on the 
value for the each feature; 

eliminating the at least one feature with the minimum ranking criterion from 
the group; and 

recording the eliminated feature in a feature ranking list.

8. The method of claim 1, further comprising:

repeating the eliminating of the at least one feature from the group and the
updating of the value for each feature of the group until a number of features in the
group reaches a predetermined value.
9. An apparatus, comprising:

a training logic to determine a value for each feature in a group of features
provided by a training data; and

an eliminate logic to eliminate at least one feature from the group by
utilizing the value for each feature in the group,

wherein the training logic further updates the value for each feature in the
group based on a part of the training data that corresponds to the eliminated
feature.

10. The apparatus of claim 9, wherein the training data comprises a
plurality of training samples, each of the training samples having the group of
features.

11. The apparatus of claim 9, further comprising:

a decision logic to decide whether to repeat the elimination of the at least
one feature from the group and the update of the value for each feature of the group
until a number of features in the group reaches a predetermined value.

12. The apparatus of claim 9, wherein the training logic further comprises: 
a kernel data logic to compute a kernel data based upon the training data; 
a buffer to store a kernel data; 

a value logic to compute the value based on the kernel data. 

13. The apparatus of claim 12, wherein the kernel data logic further 
updates the kernel data in the buffer based on the part of the training data that 
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corresponds to the eliminated features, and the value logic further updates the 
value based upon the updated kernel data. 

14. The apparatus of claim 12, wherein the kernel data logic further
subtracts a matrix from the kernel data, each component of the matrix comprising
a dot product of two of the training samples provided by the part of the training
data.
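
By way of illustration only, the kernel update recited above can be sketched as follows. The sketch assumes a linear kernel, so that the part of the training data corresponding to an eliminated feature contributes x_i[f] * x_j[f] to each kernel entry; the function names `linear_kernel` and `remove_feature_from_kernel` and the toy samples are hypothetical and do not appear in the specification.

```python
def linear_kernel(samples):
    """Return K where K[i][j] is the dot product of training samples i and j."""
    return [[sum(a * b for a, b in zip(xi, xj)) for xj in samples]
            for xi in samples]


def remove_feature_from_kernel(kernel, samples, feature):
    """Subtract the eliminated feature's contribution from every kernel entry.

    Assumption: linear kernel, so the matrix to subtract has components
    samples[i][feature] * samples[j][feature].
    """
    column = [x[feature] for x in samples]
    n = len(column)
    return [[kernel[i][j] - column[i] * column[j] for j in range(n)]
            for i in range(n)]


# Hypothetical toy data: two training samples, three features.
samples = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
kernel = linear_kernel(samples)

# Update the buffered kernel after eliminating feature 1 ...
updated = remove_feature_from_kernel(kernel, samples, feature=1)

# ... and compare against a kernel recomputed from the remaining features.
recomputed = linear_kernel([[x[0], x[2]] for x in samples])
```

Under that assumption the updated kernel equals the kernel recomputed from the remaining features, while the update touches only one feature column rather than all of them.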

15. The apparatus of claim 9, wherein the eliminate logic further comprises
a ranking criterion logic to compute a ranking criterion for each feature of the
group based on the value for the each feature.

16. The apparatus of claim 9, wherein the eliminate logic further comprises 
a feature eliminate logic to eliminate the at least one feature having the minimum 
ranking criterion from the group. 

17. A machine-readable medium comprising a plurality of instructions, that 
in response to being executed, result in a computing system: 

determining a value for each feature in a group of features provided by a 
training data; 

eliminating at least one feature from the group by utilizing the value for
each feature in the group; and

updating the value for each feature in the group based on a part of the 
training data that corresponds to the eliminated feature. 

18. The machine-readable medium of claim 17, wherein the training data
further comprises a plurality of training samples, each of the training samples 
corresponding to the group of features. 

19. The machine-readable medium of claim 17, wherein the plurality of
instructions that result in the computing system determining the value, further
result in the computing system:

computing a kernel data based on the training data;

computing the value for each feature of the group based on the kernel data;
and

storing the kernel data in a buffer.

20. The machine-readable medium of claim 19, wherein the plurality of
instructions that result in the computing system computing the kernel data,
further result in the computing system computing a matrix as the kernel data,
each component of the matrix comprising a dot product of two of the training
samples provided by the training data.

21. The machine-readable medium of claim 17, wherein the plurality of instructions
that result in the computing system updating the value, further result in the 
computing system: 

retrieving a kernel data from a buffer; 

updating the kernel data based on the part of the training data that 
corresponds to the eliminated feature; and 
updating the value for each feature of the group based on the updated
kernel data. 

22. The machine-readable medium of claim 21, wherein the plurality of
instructions that result in the computing system updating the kernel data,
further result in the computing system:

subtracting a matrix from the kernel data, each component of the matrix
comprising a dot product of two of the training samples provided by the part of
the training data that corresponds to the eliminated feature.

23. The machine-readable medium of claim 17, wherein the plurality of instructions
that result in the computing system eliminating at least one feature, further result 
in the computing system: 

computing a ranking criterion for each feature of the group based on the 
value for the each feature; 

eliminating the at least one feature with the minimum ranking criterion from the
group; and 

recording the eliminated feature in a feature ranking list.

24. The machine-readable medium of claim 17, wherein the plurality of
instructions further result in the computing system:

repeating the eliminating of the at least one feature from the group and the
updating of the value for each feature of the group until a number of features
in the group reaches a predetermined value.

Method, apparatus and system are described to perform a feature
eliminating method based on a support vector machine. In some embodiments, a 
value for each feature in a group of features provided by a training data is 
determined. At least one feature is eliminated from the group by utilizing the value 
for each feature in the group. The value for each feature in the group is updated 
based upon a part of the training data that corresponds to the eliminated feature. 
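
The sequence summarized above may be sketched, under stated assumptions, as follows. The sketch assumes a linear kernel, takes the "value" of each feature to be its component of the SVM weight vector, and uses the squared weight as the ranking criterion, as is conventional for SVM-RFE. The trainer `train_alphas` is a simple stand-in (dual coordinate ascent for a bias-free soft-margin SVM), not the training method of the embodiments; all function names, parameters and the toy data are hypothetical.

```python
def train_alphas(kernel, labels, c=1.0, sweeps=200):
    """Stand-in trainer: dual coordinate ascent for a bias-free soft-margin SVM."""
    n = len(labels)
    alphas = [0.0] * n
    for _ in range(sweeps):
        for i in range(n):
            if kernel[i][i] <= 0.0:
                continue  # sample has no remaining signal; skip it
            margin = labels[i] * sum(
                alphas[j] * labels[j] * kernel[i][j] for j in range(n))
            step = (1.0 - margin) / kernel[i][i]
            alphas[i] = min(c, max(0.0, alphas[i] + step))
    return alphas


def svm_rfe(samples, labels, stop_at=0):
    """Return features in order of elimination (least important first)."""
    n = len(samples)
    remaining = list(range(len(samples[0])))
    # The kernel data is computed once and kept (the "buffer").
    kernel = [[sum(a * b for a, b in zip(xi, xj)) for xj in samples]
              for xi in samples]
    ranking = []
    while len(remaining) > stop_at:
        alphas = train_alphas(kernel, labels)
        # Value per feature: weight w_f = sum_i alpha_i * y_i * x_i[f].
        weights = {f: sum(alphas[i] * labels[i] * samples[i][f]
                          for i in range(n)) for f in remaining}
        # Ranking criterion (assumed): squared weight; eliminate the minimum.
        worst = min(remaining, key=lambda f: weights[f] ** 2)
        remaining.remove(worst)
        ranking.append(worst)
        # Update the kernel using only the eliminated feature's data.
        for i in range(n):
            for j in range(n):
                kernel[i][j] -= samples[i][worst] * samples[j][worst]
    return ranking


# Hypothetical toy data: feature 0 separates the classes, feature 1 is
# constant zero, feature 2 is uncorrelated with the class labels.
samples = [[2.0, 0.0, 0.5], [2.0, 0.0, -0.5],
           [-2.0, 0.0, 0.5], [-2.0, 0.0, -0.5]]
labels = [1, 1, -1, -1]
ranking = svm_rfe(samples, labels)
```

In this sketch the kernel is never recomputed from scratch; each round reuses the stored kernel and subtracts only the contribution of the feature just eliminated. Reading `ranking` in reverse gives the features from most to least important.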